target motion
CigTime: Corrective Instruction Generation Through Inverse Motion Editing
Recent advancements in models linking natural language with human motions have shown significant promise in motion generation and editing based on instructional text. Motivated by applications in sports coaching and motor skill learning, we investigate the inverse problem: generating corrective instructional text, leveraging motion editing and generation models. We introduce a novel approach that, given a user's current motion (source) and the desired motion (target), generates text instructions to guide the user towards achieving the target motion. We leverage large language models to generate corrective texts and utilize existing motion generation and editing frameworks to compile datasets of triplets (source motion, target motion, and corrective text). Using this data, we propose a new motion-language model for generating corrective instructions. We present both qualitative and quantitative results across a diverse range of applications that largely improve upon baselines. Our approach demonstrates its effectiveness in instructional scenarios, offering text-based guidance to correct and enhance user performance.
ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots
Xu, Zhixuan, Gao, Chongkai, Liu, Zixuan, Yang, Gang, Tie, Chenrui, Zheng, Haozhuo, Zhou, Haoyu, Peng, Weikun, Wang, Debang, Chen, Tianyi, Yu, Zhouliang, Shao, Lin
To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to develop a foundation model for general robotic manipulation that formalizes a manipulation task as contact synthesis. Specifically, our model takes as input object and robot manipulator point clouds, object physical attributes, target motions, and manipulation region masks. It outputs contact points on the object and associated contact forces or post-contact motions for robots to achieve the desired manipulation task. We perform extensive experiments in both simulation and real-world settings, manipulating articulated rigid objects, rigid objects, and deformable objects that vary in dimensionality, ranging from one-dimensional objects like ropes to two-dimensional objects like cloth and extending to three-dimensional objects such as plasticine. Our model achieves average success rates of around 90%. Supplementary materials and videos are available on our project website at https://manifoundationmodel.github.io/.
FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning
Li, Chenhao, Stanger-Jones, Elijah, Heim, Steve, Kim, Sangbae
Motion trajectories offer reliable references for physics-based motion learning but suffer from sparsity, particularly in regions that lack sufficient data coverage. To address this challenge, we introduce a self-supervised, structured representation and generation method that extracts spatial-temporal relationships in periodic or quasi-periodic motions. The motion dynamics in a continuously parameterized latent space enable our method to enhance the interpolation and generalization capabilities of motion learning algorithms. The motion learning controller, informed by the motion parameterization, operates online tracking of a wide range of motions, including targets unseen during training. With a fallback mechanism, the controller dynamically adapts its tracking strategy and automatically resorts to safe action execution when a potentially risky target is proposed. By leveraging the identified spatial-temporal structure, our work opens new possibilities for future advancements in general motion representation and learning algorithms.
The availability of reference trajectories, such as motion capture data, has significantly propelled the advancement of motion learning techniques (Peng et al., 2018; Bergamin et al., 2019; Peng et al., 2021; 2022; Starke et al., 2022; Li et al., 2023b;a). However, it is difficult to generalize policies using these techniques to motions outside the distribution of the available data (Peng et al., 2020; Li et al., 2023a). A core reason is that, while the trajectories in the data itself are induced by some dynamics of the system, the learned policies are typically trained to only replicate the data, instead of understanding the underlying dynamics structure. In other words, the policies attempt to memorize the trajectory instances rather than learn to predict them systematically.
Moreover, the high nonlinearity and the embedded high-level similarity hinder data-driven methods from effectively identifying and modeling the dynamics of motion patterns (Peng et al., 2018). Therefore, addressing these challenges requires systematically understanding and leveraging the structured nature of the motion space. Instead of handling raw motion trajectories in long-horizon, high-dimensional state space, structured representation methods introduce certain inductive biases during training and offer an efficient approach to managing complex movements (Min & Chai, 2012; Lee et al., 2021). These methods focus on extracting the essential features and temporal dependencies of motions, enabling more effective and compact representations (Lee et al., 2010; Levine et al., 2012). The ability to understand and capture the spatial-temporal structure of the motion space offers enhanced interpolation and generalization capabilities that can augment training datasets and improve the effectiveness of motion generation algorithms (Holden et al., 2017; Iscen et al., 2018; Ibarz et al., 2021).
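As a rough illustration of what a compact, continuously parameterized description of a periodic motion can look like, the sketch below summarizes one evenly sampled trajectory channel by a single sinusoid (frequency, amplitude, phase, offset) obtained from a plain discrete Fourier transform. This is not the FLD method, which learns its representation in a self-supervised way; the names `sinusoid_params` and `reconstruct`, the single-channel setup, and the assumption that the window holds a whole number of periods are all illustrative choices.

```python
import cmath
import math


def sinusoid_params(signal, dt):
    """Return (frequency_hz, amplitude, phase, offset) of the dominant sinusoid.

    Assumes `signal` is evenly sampled every `dt` seconds and is modeled as
    offset + amplitude * cos(2*pi*frequency*t + phase).
    """
    n = len(signal)
    offset = sum(signal) / n
    centered = [x - offset for x in signal]

    best_k, best_coeff = 1, 0j
    # Scan the positive-frequency DFT bins below Nyquist and keep the largest.
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > abs(best_coeff):
            best_k, best_coeff = k, coeff

    frequency = best_k / (n * dt)        # bin index -> Hz
    amplitude = 2 * abs(best_coeff) / n  # one-sided amplitude
    phase = cmath.phase(best_coeff)      # phase of the cosine at t = 0
    return frequency, amplitude, phase, offset


def reconstruct(params, n, dt):
    """Regenerate n samples from the four sinusoid parameters."""
    frequency, amplitude, phase, offset = params
    return [
        offset + amplitude * math.cos(2 * math.pi * frequency * t * dt + phase)
        for t in range(n)
    ]
```

Because the four parameters vary continuously, intermediate parameter values describe intermediate motions, which gives a simple intuition for why a structured periodic representation can support interpolation between recorded trajectories.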
Development of On-Ground Hardware In Loop Simulation Facility for Space Robotics
Sah, Roshan, Srivastava, Raunak, Das, Kaushik
Over the past couple of decades, space junk has increased rapidly, posing a significant threat to satellites operating in low Earth orbit (LEO). Active Debris Removal (ADR) concepts continue to evolve for removing space junk. One ADR method is space robotics, whose function is to chase, capture, and de-orbit the space junk. This paper presents the development of an on-ground space robotics facility at TCS Research for on-orbit servicing (OOS) experiments such as refueling and debris capture. A Hardware in Loop Simulation (HILS) system will be used for integrated system development, testing, and demonstration of on-orbit docking mechanisms. The HILS test facility of the TCS Research Lab will use two UR manipulators: one is attached to the RG2 gripper, and the other is attached to a force-torque sensor and a scaled mock-up model. The first manipulator, a UR5, will be mounted on a 7-axis linear rail and will carry the docking probe. First, the UR5 with a suitable gripper has to be interfaced with its control boxes. The grasping algorithm was run through the ROS interface to demonstrate and validate the on-orbit operations. The manipulator will carry a LIDAR and a camera to visualize the mock-up model and estimate the target model's pose and rotational velocity, along with a gripper that will move relative to the target model. The other manipulator, under UR10 control, provides rotational and random motion to the mock-up, enabling a dynamic simulator fed by force-torque data. The dynamic simulator is also fed by the orbit propagator, which will provide the orbital environment for the target model. For the simulation of docking and grasping of the target model, a 6 m linear rail setup is still being procured. Once in proximity, the grasping algorithm will be launched to capture the target model after reading the random motion of the mock-up model.
VLSI Model of Primate Visual Smooth Pursuit
Etienne-Cummings, Ralph, Spiegel, Jan Van der, Mueller, Paul
A one-dimensional model of the primate smooth pursuit mechanism has been implemented in 2 μm CMOS VLSI. The model consolidates Robinson's negative feedback model with Wyatt and Pola's positive feedback scheme to produce a smooth pursuit system which zeros the velocity of a target on the retina. Furthermore, the system uses the current eye motion as a predictor for future target motion. Analysis, stability, and biological correspondence of the system are discussed. For implementation at the focal plane, a local correlation-based visual motion detection technique is used. Velocity measurements, ranging over 4 orders of magnitude with 15% variation, provide the input to the smooth pursuit system. The system performed successful velocity tracking for high-contrast scenes. Circuit design and performance of the complete smooth pursuit system are presented.
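The idea of driving retinal velocity to zero while reusing the current eye motion as a prediction can be pictured with a toy discrete-time loop. The sketch below is only an assumption-laden illustration, not the circuit or the equations of the paper: the update rule, the `gain` parameter, and the function name `simulate_pursuit` are hypothetical. The error term (target velocity minus eye velocity) plays the role of the negative feedback, and carrying the current eye velocity forward into the next command plays the role of the positive feedback predictor.

```python
def simulate_pursuit(target_velocity, gain=0.5, steps=40):
    """Toy pursuit loop for a constant-velocity target.

    Returns the final eye velocity and the retinal slip seen at each step.
    With 0 < gain < 2 the slip shrinks by a factor |1 - gain| per step.
    """
    eye_velocity = 0.0
    slips = []
    for _ in range(steps):
        # Negative feedback: the retina only sees the target's velocity
        # relative to the moving eye (the retinal slip).
        retinal_slip = target_velocity - eye_velocity
        slips.append(retinal_slip)
        # Positive feedback: the current eye velocity is carried forward as
        # the prediction of target motion, then corrected by the slip.
        eye_velocity = eye_velocity + gain * retinal_slip
    return eye_velocity, slips


final_velocity, slips = simulate_pursuit(target_velocity=10.0)
```

In this toy loop the eye keeps moving at the target velocity once the slip has vanished, because the command is sustained by the fed-back eye velocity rather than by a nonzero retinal error.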